{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "JQsk1BKwZrBh"
   },
   "source": [
    "# Ungraded Lab: Predicting Sunspots with Neural Networks\n",
    "\n",
    "In the remaining labs for this week, you will move away from synthetic time series and start building models for real world data. In particular, you will train on the [Sunspots](https://www.kaggle.com/datasets/robervalt/sunspots) dataset: a monthly record of sunspot numbers from January 1749 to July 2018. You will first build a deep neural network here composed of dense layers. This will act as your baseline so you can compare it to the next lab where you will use a more complex architecture.\n",
    "\n",
    "Let's begin!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2JHlrB12aeFD"
   },
   "source": [
    "## Imports\n",
    "\n",
    "You will use the same imports as before with the addition of the [csv](https://docs.python.org/3/library/csv.html) module. You will need this to parse the CSV file containing the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "56XEQOGknrAk"
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "D-mNttnaagBH"
   },
   "source": [
    "## Utilities\n",
    "\n",
    "You will only have the `plot_series()` dataset here because you no longer need the synthetic data generation functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "sLl52leVp5wU"
   },
   "outputs": [],
   "source": [
    "def plot_series(x, y, format=\"-\", start=0, end=None, \n",
    "                title=None, xlabel=None, ylabel=None, legend=None ):\n",
    "    \"\"\"\n",
    "    Visualizes time series data\n",
    "\n",
    "    Args:\n",
    "      x (array of int) - contains values for the x-axis\n",
    "      y (array of int or tuple of arrays) - contains the values for the y-axis\n",
    "      format (string) - line style when plotting the graph\n",
    "      label (string) - tag for the line\n",
    "      start (int) - first time step to plot\n",
    "      end (int) - last time step to plot\n",
    "      title (string) - title of the plot\n",
    "      xlabel (string) - label for the x-axis\n",
    "      ylabel (string) - label for the y-axis\n",
    "      legend (list of strings) - legend for the plot\n",
    "    \"\"\"\n",
    "\n",
    "    # Setup dimensions of the graph figure\n",
    "    plt.figure(figsize=(10, 6))\n",
    "    \n",
    "    # Check if there are more than two series to plot\n",
    "    if type(y) is tuple:\n",
    "\n",
    "      # Loop over the y elements\n",
    "      for y_curr in y:\n",
    "\n",
    "        # Plot the x and current y values\n",
    "        plt.plot(x[start:end], y_curr[start:end], format)\n",
    "\n",
    "    else:\n",
    "      # Plot the x and y values\n",
    "      plt.plot(x[start:end], y[start:end], format)\n",
    "\n",
    "    # Label the x-axis\n",
    "    plt.xlabel(xlabel)\n",
    "\n",
    "    # Label the y-axis\n",
    "    plt.ylabel(ylabel)\n",
    "\n",
    "    # Set the legend\n",
    "    if legend:\n",
    "      plt.legend(legend)\n",
    "\n",
    "    # Set the title\n",
    "    plt.title(title)\n",
    "\n",
    "    # Overlay a grid on the graph\n",
    "    plt.grid(True)\n",
    "\n",
    "    # Draw the graph on screen\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dFJv45pDauS8"
   },
   "source": [
    "## Download and Preview the Dataset\n",
    "\n",
    "You can now download the dataset and inspect the contents. The link in class is from Laurence's repo but we also hosted it in the link below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "YwI-si5xyUkQ"
   },
   "outputs": [],
   "source": [
    "# Download the dataset\n",
    "!wget -nc https://storage.googleapis.com/tensorflow-1-public/course4/Sunspots.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "c0fAiMytrwPJ"
   },
   "source": [
    "Running the cell below, you'll see that there are only three columns in the dataset:\n",
    "1. untitled column containing the month number\n",
    "2. Date which has the format `YYYY-MM-DD`\n",
    "3. Mean Total Sunspot Number"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "d5W2auXKrhVh"
   },
   "outputs": [],
   "source": [
    "# Preview the dataset\n",
    "!head Sunspots.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lbs-Y2SDsVaw"
   },
   "source": [
    "For this lab and the next, you will only need the month number and the mean total sunspot number. You will load those into memory and convert it to arrays that represents a time series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "NcG9r1eClbTh"
   },
   "outputs": [],
   "source": [
    "# Initialize lists\n",
    "time_step = []\n",
    "sunspots = []\n",
    "\n",
    "# Open CSV file\n",
    "with open('./Sunspots.csv') as csvfile:\n",
    "  \n",
    "  # Initialize reader\n",
    "  reader = csv.reader(csvfile, delimiter=',')\n",
    "  \n",
    "  # Skip the first line\n",
    "  next(reader)\n",
    "  \n",
    "  # Append row and sunspot number to lists\n",
    "  for row in reader:\n",
    "    time_step.append(int(row[0]))\n",
    "    sunspots.append(float(row[2]))\n",
    "\n",
    "# Convert lists to numpy arrays\n",
    "time = np.array(time_step)\n",
    "series = np.array(sunspots)\n",
    "\n",
    "# Preview the data\n",
    "plot_series(time, series, xlabel='Month', ylabel='Monthly Mean Total Sunspot Number')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EUQE9RRoazC5"
   },
   "source": [
    "## Split the Dataset\n",
    "\n",
    "Next, you will split the dataset into training and validation sets. There are 3235 points in the dataset and you will use the first 3000 for training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "L92YRw_IpCFG"
   },
   "outputs": [],
   "source": [
    "# Define the split time\n",
    "split_time = 3000\n",
    "\n",
    "# Get the train set \n",
    "time_train = time[:split_time]\n",
    "x_train = series[:split_time]\n",
    "\n",
    "# Get the validation set\n",
    "time_valid = time[split_time:]\n",
    "x_valid = series[split_time:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RbuIOrb3a093"
   },
   "source": [
    "## Prepare Features and Labels\n",
    "\n",
    "You can then prepare the dataset windows as before. The window size is set to 30 points (equal to 2.5 years) but feel free to change later on if you want to experiment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "lJwUUZscnG38"
   },
   "outputs": [],
   "source": [
    "def windowed_dataset(series, window_size, batch_size, shuffle_buffer):\n",
    "    \"\"\"Generates dataset windows\n",
    "\n",
    "    Args:\n",
    "      series (array of float) - contains the values of the time series\n",
    "      window_size (int) - the number of time steps to include in the feature\n",
    "      batch_size (int) - the batch size\n",
    "      shuffle_buffer(int) - buffer size to use for the shuffle method\n",
    "\n",
    "    Returns:\n",
    "      dataset (TF Dataset) - TF Dataset containing time windows\n",
    "    \"\"\"\n",
    "  \n",
    "    # Generate a TF Dataset from the series values\n",
    "    dataset = tf.data.Dataset.from_tensor_slices(series)\n",
    "    \n",
    "    # Window the data but only take those with the specified size\n",
    "    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)\n",
    "    \n",
    "    # Flatten the windows by putting its elements in a single batch\n",
    "    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))\n",
    "\n",
    "    # Create tuples with features and labels \n",
    "    dataset = dataset.map(lambda window: (window[:-1], window[-1]))\n",
    "\n",
    "    # Shuffle the windows\n",
    "    dataset = dataset.shuffle(shuffle_buffer)\n",
    "    \n",
    "    # Create batches of windows\n",
    "    dataset = dataset.batch(batch_size)\n",
    "    \n",
    "    # Optimize the dataset for training\n",
    "    dataset = dataset.cache().prefetch(1)\n",
    "    \n",
    "    return dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9g5zSxZwZQt_"
   },
   "outputs": [],
   "source": [
    "# Parameters\n",
    "window_size = 30\n",
    "batch_size = 32\n",
    "shuffle_buffer_size = 1000\n",
    "\n",
    "# Generate the dataset windows\n",
    "train_set = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_Lhpi42ta5yP"
   },
   "source": [
    "## Build the Model\n",
    "\n",
    "The model will be 3-layer dense network as shown below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "AclfYY3Mn6Ph"
   },
   "outputs": [],
   "source": [
    "# Build the model\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(window_size,)),\n",
    "    tf.keras.layers.Dense(30, activation=\"relu\"), \n",
    "    tf.keras.layers.Dense(10, activation=\"relu\"),\n",
    "    tf.keras.layers.Dense(1)\n",
    "])\n",
    "\n",
    "# Print the model summary\n",
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uyT14hQOa97V"
   },
   "source": [
    "## Tune the Learning Rate\n",
    "\n",
    "You can pick a learning rate by running the same learning rate scheduler code from previous labs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GXiqsZQ1y4nD"
   },
   "outputs": [],
   "source": [
    "# Set the learning rate scheduler\n",
    "lr_schedule = tf.keras.callbacks.LearningRateScheduler(\n",
    "    lambda epoch: 1e-8 * 10**(epoch / 20))\n",
    "\n",
    "# Initialize the optimizer\n",
    "optimizer = tf.keras.optimizers.SGD(momentum=0.9)\n",
    "\n",
    "# Set the training parameters\n",
    "model.compile(loss=tf.keras.losses.Huber(), optimizer=optimizer)\n",
    "\n",
    "# Train the model\n",
    "history = model.fit(train_set, epochs=100, callbacks=[lr_schedule])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "xJFAB1NTzGJV"
   },
   "outputs": [],
   "source": [
    "# Define the learning rate array\n",
    "lrs = 1e-8 * (10 ** (np.arange(100) / 20))\n",
    "\n",
    "# Set the figure size\n",
    "plt.figure(figsize=(10, 6))\n",
    "\n",
    "# Set the grid\n",
    "plt.grid(True)\n",
    "\n",
    "# Plot the loss in log scale\n",
    "plt.semilogx(lrs, history.history[\"loss\"])\n",
    "\n",
    "# Increase the tickmarks size\n",
    "plt.tick_params('both', length=10, width=1, which='both')\n",
    "\n",
    "# Set the plot boundaries\n",
    "plt.axis([1e-8, 1e-3, 0, 100])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oY4mnUpNbFdc"
   },
   "source": [
    "## Train the Model\n",
    "\n",
    "Once you've picked a learning rate, you can rebuild the model and start training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Ngna3zR4znJd"
   },
   "outputs": [],
   "source": [
    "# Reset states generated by Keras\n",
    "tf.keras.backend.clear_session()\n",
    "\n",
    "# Build the Model\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.Input(shape=(window_size,)),\n",
    "    tf.keras.layers.Dense(30, activation=\"relu\"), \n",
    "    tf.keras.layers.Dense(10, activation=\"relu\"),\n",
    "    tf.keras.layers.Dense(1)\n",
    "])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "7FKXvEYxzuoc"
   },
   "outputs": [],
   "source": [
    "# Set the learning rate\n",
    "learning_rate = 2e-5\n",
    "\n",
    "# Set the optimizer \n",
    "optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9)\n",
    "\n",
    "# Set the training parameters\n",
    "model.compile(loss=tf.keras.losses.Huber(),\n",
    "              optimizer=optimizer,\n",
    "              metrics=[\"mae\"])\n",
    "\n",
    "# Train the model\n",
    "history = model.fit(train_set,epochs=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "b1iV2t8ibIka"
   },
   "source": [
    "## Model Prediction\n",
    "\n",
    "Now see if the model generates good results. If you used the default parameters of this notebook, you should see the predictions follow the shape of the ground truth with an MAE of around 15. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "djn06Tri2B6_"
   },
   "outputs": [],
   "source": [
    "def model_forecast(model, series, window_size, batch_size):\n",
    "    \"\"\"Uses an input model to generate predictions on data windows\n",
    "\n",
    "    Args:\n",
    "      model (TF Keras Model) - model that accepts data windows\n",
    "      series (array of float) - contains the values of the time series\n",
    "      window_size (int) - the number of time steps to include in the window\n",
    "      batch_size (int) - the batch size\n",
    "\n",
    "    Returns:\n",
    "      forecast (numpy array) - array containing predictions\n",
    "    \"\"\"\n",
    "\n",
    "    # Generate a TF Dataset from the series values\n",
    "    dataset = tf.data.Dataset.from_tensor_slices(series)\n",
    "\n",
    "    # Window the data but only take those with the specified size\n",
    "    dataset = dataset.window(window_size, shift=1, drop_remainder=True)\n",
    "\n",
    "    # Flatten the windows by putting its elements in a single batch\n",
    "    dataset = dataset.flat_map(lambda w: w.batch(window_size))\n",
    "    \n",
    "    # Create batches of windows\n",
    "    dataset = dataset.batch(batch_size).prefetch(1)\n",
    "    \n",
    "    # Get predictions on the entire dataset\n",
    "    forecast = model.predict(dataset, verbose=0)\n",
    "    \n",
    "    return forecast"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GaC6NNMRp0lb"
   },
   "outputs": [],
   "source": [
    "# Reduce the original series\n",
    "forecast_series = series[split_time-window_size:-1]\n",
    "\n",
    "# Use helper function to generate predictions\n",
    "forecast = model_forecast(model, forecast_series, window_size, batch_size)\n",
    "\n",
    "# Drop single dimensional axis\n",
    "results = forecast.squeeze()\n",
    "\n",
    "# Plot the results\n",
    "plot_series(time_valid, (x_valid, results))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "13XrorC5wQoE"
   },
   "outputs": [],
   "source": [
    "# Compute the MAE\n",
    "print(tf.keras.metrics.mae(x_valid, results).numpy())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "YqMGN9S5veN2"
   },
   "source": [
    "## Wrap Up\n",
    "\n",
    "In this lab, you built a relatively simple DNN to forecast sunspot numbers for a given month. We encourage you to tweak the parameters or train longer and see the best results you can get. In the next lab, you will build a more complex model and you evaluate if the added complexity translates to better or worse results."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "C4_W4_Lab_2_Sunspots_DNN.ipynb",
   "private_outputs": true,
   "provenance": [
    {
     "file_id": "https://github.com/https-deeplearning-ai/tensorflow-1-public/blob/adding_C4/C4/W4/ungraded_labs/C4_W4_Lab_3_DNN_only.ipynb",
     "timestamp": 1641292790865
    }
   ]
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0rc1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
